Tolerating First Level Memory Access Latency in High-Performance Systems

Authors

  • William Y. Chen
  • Scott A. Mahlke
  • Wen-mei W. Hwu
Abstract

To improve performance, future parallel systems will continue to increase the processing power of each node. As node processors execute more instructions concurrently, however, they become more sensitive to first-level memory access latency. This paper presents a set of hardware and software techniques, collectively referred to as register preloading, that effectively tolerate long first-level memory access latency. The techniques include speculative execution, loop unrolling, dynamic memory disambiguation, and strip-mining. Results show that register preloading provides excellent tolerance to first-level memory access latency of up to 16 cycles for an issue-4 node processor.
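
As a rough illustration of why loop unrolling helps tolerate load latency, the sketch below unrolls a reduction by four and issues the loads for the next group of elements before the current group is consumed, so that load latency can overlap with computation. This is not the paper's register preloading mechanism, which also relies on hardware support (speculative execution and dynamic memory disambiguation); the function names `sum_simple` and `sum_preloaded` and the unroll factor of 4 are assumptions made only for this example.

```c
#include <stddef.h>

/* Baseline: each iteration loads a[i] and uses it immediately,
 * so the add must wait for the load to complete. */
long sum_simple(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical software-only sketch: unroll by 4 and load the NEXT
 * group of four elements into locals before consuming the CURRENT
 * group, giving the loads time to complete. */
long sum_preloaded(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    if (n >= 4) {
        /* Preload the first group. */
        int r0 = a[0], r1 = a[1], r2 = a[2], r3 = a[3];
        for (i = 4; i + 4 <= n; i += 4) {
            /* Issue loads for the next group early. */
            int n0 = a[i], n1 = a[i + 1], n2 = a[i + 2], n3 = a[i + 3];
            /* Consume the group loaded one iteration ago. */
            s += (long)r0 + r1 + r2 + r3;
            r0 = n0; r1 = n1; r2 = n2; r3 = n3;
        }
        /* Drain the last preloaded group. */
        s += (long)r0 + r1 + r2 + r3;
    }
    /* Remainder loop for the final n % 4 elements (or all of them if n < 4). */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions return the same sum; whether the second is actually faster depends on the compiler and the target processor, since an optimizing compiler may already schedule loads this way.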

Related resources

A performance evaluation of cache injection in bus-based shared memory multiprocessors

Bus-based shared memory multiprocessors with private caches and snooping write-invalidate cache coherence protocols are the dominant form of small- to medium-scale parallel machines today. In these systems, high memory latency poses the major hurdle to achieving high performance. One way to cope with this problem is to use various techniques for tolerating high memory latency. Software-controlled ...

Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques

In both hardware-only and software-only directory protocols the performance is often limited by memory access stall times. To increase the performance, several latency tolerating and reducing techniques have been proposed and shown effective for hardware-only directory protocols. For software-only directory protocols, the efficiency of a technique depends not only on how effective it is as seen...

Effects of Multithreading on Cache Performance

As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism. The question, however, remains as to how effective multithreading is at tolerating memory latency. The...

Low Power System Design by Combining Software Prefetching and Dynamic Voltage Scaling

Performance-enhancement techniques improve CPU speed at the cost of other valuable system resources such as power and energy. Software prefetching is one such technique, tolerating memory latency for high performance. In this article, we quantitatively study this technique’s impact on system performance and power/energy consumption. First, we demonstrate that software prefetching achieves...

Publication date: 1992